A graph based algorithm for generating EST consensus sequences
Abstract
MOTIVATION: Expressed sequence tag (EST) sequences constitute an abundant, yet error-prone, resource for computational biology. Expressed sequences are important in gene discovery and identification, and they are also crucial for the discovery and classification of alternative splicing. An important challenge when processing EST sequences is the reconstruction of mRNA by assembling EST clusters into consensus sequences.

RESULTS: In contrast to the more established assembly tools, we propose an algorithm that constructs a graph over sequence fragments of fixed size and produces consensus sequences as traversals of this graph. We provide a tool implementing this algorithm, and perform an experiment in which the consensus sequences produced by our implementation, as well as by currently available tools, are compared to mRNA. The results show that, in a majority of cases, our proposed algorithm produces consensus sequences of higher quality than the established sequence assemblers, and at a competitive speed.

AVAILABILITY: The source code for the implementation is available under a GPL license from http://www.ii.uib.no/~ketil/bioinformatics/

CONTACT: [email protected].
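The abstract describes the approach only in outline: a graph over fixed-size sequence fragments, with consensus sequences read off as traversals. The following is a minimal sketch of that general idea, not the authors' implementation. The function names, the fragment size `k = 4`, the toy reads, and the greedy "follow the most frequently observed edge" traversal are all assumptions made for illustration; the published tool may weight edges and choose paths quite differently.

```python
from collections import defaultdict


def build_kmer_graph(reads, k):
    """Count how often each k-mer is immediately followed by another k-mer.

    Nodes are fixed-size fragments (k-mers); an edge src -> dst exists when
    dst overlaps src by k - 1 characters somewhere in a read.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k):
            src = read[i:i + k]
            dst = read[i + 1:i + k + 1]
            edges[src][dst] += 1
    return edges


def greedy_consensus(edges, start):
    """Traverse from `start`, always taking the most frequently seen edge.

    This greedy rule is an illustrative assumption. The walk stops at a node
    with no outgoing edges or when it would revisit a node.
    """
    path = start
    node = start
    visited = {start}
    while node in edges:
        # Highest count wins; ties are broken by k-mer text for determinism.
        nxt = max(edges[node].items(), key=lambda kv: (kv[1], kv[0]))[0]
        if nxt in visited:
            break
        visited.add(nxt)
        path += nxt[-1]  # each step extends the consensus by one base
        node = nxt
    return path


# Hypothetical toy cluster: three consistent reads and one with a
# single-base error (G -> C at position 6 of the third read).
reads = [
    "ATGGCGTGCA",
    "GGCGTGCAAT",
    "ATGGCCTGCA",
    "TGGCGTGCAA",
]
graph = build_kmer_graph(reads, k=4)
print(greedy_consensus(graph, "ATGG"))  # prints ATGGCGTGCAAT
```

In this toy cluster the erroneous read contributes a side branch (`TGGC -> GGCC`) that is seen once, while the correct branch (`TGGC -> GGCG`) is seen twice, so the traversal bypasses the error.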
Similar resources
Automated Clustering and Assembly of Large EST Collections
The availability of large EST (Expressed Sequence Tag) databases has led to a revolution in the way new genes are cloned. Difficulties arise, however, due to high error rates and redundancy of raw EST data. For these reasons, one of the first tasks performed by a scientist investigating any EST of interest is to gather contiguous ESTs and assemble them into a larger virtual cDNA. The REX (Recur...
A Review on Consensus Algorithms in Blockchain
Blockchain technology is a decentralized data storage structure based on a chain of data blocks that are related to each other. Blockchain saves new blocks in the ledger without trusting intermediaries, through a competitive or voting mechanism. Due to the chain structure, or the graph linking each block with its previous blocks, it is impossible to change block data. Blockchain architectur...
ESTminer: a suite of programs for gene and allele identification
UNLABELLED ESTminer is a collection of programs that use expressed sequence tag (EST) data from inbred genomes to identify unique genes within gene families. The algorithm utilizes Cap3 to perform an initial clustering of related EST sequences to produce a consensus sequence of a gene family. These consensus sequences are then used to collect all ESTs in the original EST library that are relate...
Parallelization of MIRA Whole Genome and EST Sequence Assembler
The genome assembly problem is to generate the original DNA sequence of the organism from a large set of short overlapping fragments. MIRA is an open source assembler based on the Overlap Layout Consensus (OLC) graph model which addresses the assembly problem and is widely used by biologists [1,2]. Like other assemblers MIRA takes a long time to compute the assembly for large number of sequence...
SIMULATED ANNEALING ALGORITHM FOR SELECTING SUBOPTIMAL CYCLE BASIS OF A GRAPH
The cycle basis of a graph arises in a wide range of engineering problems and has a variety of applications. Minimal and optimal cycle bases reduce the time and memory required for most of such applications. One of the important applications of cycle basis in civil engineering is its use in the force method to frame analysis to generate sparse flexibility matrices, which is needed for optimal a...
Journal title:
- Bioinformatics
Volume 21, Issue 8
Pages -
Publication date 2005